There is a growing number of level 4 (L4; gap-free gridded) sea surface temperature (SST) products
generated by blending SST data from various sources, which are available for use in a wide variety of
operational and scientific applications. In most cases, each product has been developed for a specific
user community with specific requirements guiding the design of the product. Consequently,
differences between products are implicit. In addition, anomalous atmospheric conditions, satellite
operations and production anomalies may occur which can introduce additional differences. This paper
describes a new web-based system called the L4 SST Quality Monitor (L4-SQUAM) developed to
monitor the quality of L4 SST products.

L4-SQUAM intercompares thirteen L4 products with 1-day latency in an operational environment,
serving the needs of both L4 SST product users and producers. Relative differences between products
are computed and visualized using maps, histograms, time series plots and Hovmöller diagrams, for all
combinations of products. In addition, products are compared to quality controlled in situ SST data
(available from the in situ SST Quality Monitor, iQUAM, companion system) in a consistent manner.
A full history of product statistics is retained in L4-SQUAM for time series analysis. L4-SQUAM
complements the two other Group for High Resolution SST (GHRSST) tools, the GHRSST Multi Product
Ensemble (GMPE) and the High Resolution Diagnostic Data Set (HRDDS) systems, documented in part
1 of this paper and elsewhere, respectively.

Our results reveal significant differences between SST products in coastal and open ocean areas.
Differences of > 2 °C are often observed at high latitudes, partly due to different treatment of the
sea-ice transition zone. Thus, when an ice flag is available, the intercomparisons are performed in two
ways: including and excluding ice-flagged grid points. Such differences are significant and call for a
community effort to understand their root cause and ensure consistency between SST products. Future
work focuses on including the remaining daily L4 SST products, accommodating newer L4 SSTs
which resolve the diurnal variability, and evaluating retrospectively regenerated L4 SSTs to support
satellite data reprocessing efforts aimed at generating improved SST Climate Data Records.

© 2012 Elsevier Ltd. All rights reserved.


1.  Introduction 

Satellite-based sea surface temperature (SST) products have
been operationally derived from low earth orbiting (LEO) and
geostationary (GEO) platforms, initially at the National Environmental
Satellite, Data, and Information Service (NESDIS) and subsequently
at other agencies (e.g., McClain et al., 1985; Walton, 1988;
Walton et al., 1998; May et al., 1998; Wu et al., 1999; Kilpatrick
et al., 2001; Brisson et al., 2002; Le Borgne et al., 2007; Maturi
et al., 2008). Satellite level 2 (L2; data at the observed pixels)
products are derived from level 1B (L1B; raw data with appended
calibration and Earth location information) brightness temperatures
and may be further processed into level 3 (L3; gridded data
with gaps) products. These L2 and L3 products are used for a
variety of meteorological and oceanographic applications, but
their potential is limited due to data gaps caused by satellite scan
geometry, cloud coverage, etc. Therefore, efforts at various data
centers have been directed towards generating global, gridded,
blended, gap-free SST fields with attached error statistics, known
as level 4 (L4) SSTs. In addition to various L2 SSTs from multiple
sources, many L4 products also use in situ data, and blend them
together using various interpolation techniques (Martin et al.,
2012). There is a variety of real-time and research applications
requiring global L4 fields. These applications include seasonal and
short-term weather forecasting, fisheries and coral-reef monitoring,
and the development of SST retrieval algorithms employing
radiative transfer simulations. The L4 SSTs, in particular those
with a longer history, are invaluable for generating Climate Data
Records (CDRs, defined by the United States National Research
Council as "A time series of measurements of sufficient length,
consistency, and continuity to determine climate variability and
change."). Their retrospective and near real-time analyses are
crucial for monitoring climate changes.

In order to satisfy these requirements for global SST information,
there are now approximately twenty global L4 products
produced worldwide. This poses a challenge to understand their
relative merit and performance, in terms of data coverage,
resolution and accuracy. To assist with this challenge, we have
created L4-SQUAM, an "L4 inventory" with comparison tools that
can help users to choose a product appropriate for their application,
as well as provide feedback to data producers that could help
them improve their products.

In producing an L4 SST, the goal is to optimally blend SST data
from different sources so that analysis error is minimized. Despite
this objective, inconsistencies between these products exist.
Differences of several degrees appear regionally between various
products, particularly at high latitudes, in the vicinity of western
boundary currents and in semi-enclosed basins, e.g., the Mediterranean
Sea and the Gulf of California. The time series of global
statistics also reveal that some products cluster in groups. For
example, the analyses of foundation SST, the temperature of the
water column at a depth where the temperature is free of diurnal
variability (Donlon et al., 2007), tend to be similar, while significant
differences may be observed between other products.
Such differences may be attributed to: (a) developing specific
L4 SSTs for specific applications, depending on prevailing requirements
and resources in corresponding data centers; (b) use of
different input data (satellite infrared, microwave and in situ SSTs)
of varying space-time resolutions, quality, cloud-masks, and
quality control (QC) procedures; (c) use of different blending
and optimal interpolation methods and multiple correlation
lengths; (d) different representations of SST (skin, depth, foundation,
etc.) and feature resolutions; and (e) non-uniform treatment
of land-sea and ice masks.

These challenges have been acknowledged by the Group for
High Resolution SST (GHRSST; http://www.ghrsst.org/), which
formed the Inter-Comparison Technical Advisory Group (IC-TAG;
https://www.ghrsst.org/ghrsst-science/science-team-groups/ic-tag/)
to facilitate cross-evaluation of L4 SSTs. Today, the IC-TAG
comprises three major near real-time web-based systems: the
GHRSST Multi Product Ensemble (GMPE; Part 1, Martin et al.,
2012), the Level-4 SST Quality Monitor (L4-SQUAM; Part 2, this
study) and the High-Resolution Diagnostic Data Set (HR-DDS;
Donlon et al., 2009). The major objective of Part 2 is to document
the L4-SQUAM system and illustrate how the functionalities of
this system can be used to quickly evaluate the consistency
between these various L4 fields.

To date, thirteen L4 fields are monitored in L4-SQUAM, and
work is underway to include the remaining fields (see Section
2.1). The L4-SQUAM is an extension of the L2-SQUAM described in
Dash et al. (2010). It automatically calculates "L4 minus L4"
differences for all product combinations, within ~24 h of their
availability, and plots global maps, histograms, time series and
Hovmöller plots of SST differences. Also, to understand the
differences between ice masks, analyses in L4-SQUAM are performed
two ways, both "including" and "excluding" ice masks,
when corresponding ice flags are available. The resulting diagnostics
are posted at http://www.star.nesdis.noaa.gov/sod/sst/squam/L4/.
The primary motivation for L4-SQUAM was near
real-time (NRT) monitoring, but retrospective diagnostics are also
calculated and posted on the web, and the full available time
series are analyzed every time a newer product is included in the
processing stream.

Besides L4 cross-comparisons, all products are also validated
against uniformly quality controlled in situ data available from
the NESDIS in situ SST Quality Monitor (iQUAM;
http://www.star.nesdis.noaa.gov/sod/sst/iquam/). Most L4 SSTs include in situ data
in their blending methods and are therefore not independent of
these data. For example, a less accurate analysis that gives a large
weight to in situ data will agree better with those data than a
more accurate analysis that gives little weight to in situ data. Thus,
care must be taken when interpreting the fit-to-data statistics.
However, generating consistent validation statistics against the
same data provides an easy way to compare all products. Ideally,
all L4 products should be produced in a common data format and
conform to GHRSST standards, which include listing the data sources
used to produce the analysis. This would allow products to be
validated against an independent data source, e.g., Argo floats
(e.g., Part 1, Martin et al., 2012) or ship-borne infrared radiometers
(Donlon et al., 1998, 2011; Minnett et al., 2001). Also, the
in situ drifter, ship and buoy data which are not included in the
blending procedure provide a ready source of independent
observations. The advantage of adding independent Argo data to
an "in situ inventory", such as the iQUAM, has been recognized by
its developers (cf. Xu and Ignatov, 2011) and will be explored in
the future. Unfortunately, there is currently no publicly available
community-consensus ship-borne radiometer dataset for use in
such validation.

The paper is organized as follows: Section 2 describes the
L4-SQUAM concept, system, and the L4 SST fields monitored in it.
Intercomparison results and other observations are discussed in
Section 3. Potential extensions of L4-SQUAM are explored in
Section 4. Section 5 summarizes and concludes the paper and
provides an outlook for the future.


2.  The  L4-SQUAM  concept  and  system 

The assumption for L4-SQUAM analyses is that paired differences,
$\Delta T_S$ = "L4$_i$ - L4$_j$" or "L4$_i$ - in situ", are approximately
centered about zero and distributed near normally (see discussion
for L2-SQUAM in Dash et al., 2010). The first several moments
of the distribution (mean, standard deviation, skewness and
kurtosis) are used as a measure of the proximity of the two
products and monitored in L4-SQUAM.
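
As an illustration of the moments mentioned above, the following minimal sketch computes them for a set of already matched SST pairs. It is not the L4-SQUAM code: the function name `difference_moments` is hypothetical, and the use of population (biased) moments and of excess kurtosis (zero for a normal distribution) are assumptions, since the text does not state which conventions are used.

```python
import math


def difference_moments(first, second):
    """Mean, standard deviation, skewness and excess kurtosis of first - second.

    `first` and `second` are equal-length sequences of matched SST values.
    Population moments are used (an assumption, not taken from the paper).
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n == 0:
        raise ValueError("no matched SST pairs")
    mean = sum(diffs) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((d - mean) ** 2 for d in diffs) / n
    m3 = sum((d - mean) ** 3 for d in diffs) / n
    m4 = sum((d - mean) ** 4 for d in diffs) / n
    std = math.sqrt(m2)
    if m2 == 0.0:
        # All differences identical: shape statistics are undefined.
        return mean, std, math.nan, math.nan
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, std, skewness, excess_kurtosis
```

For two products that are close to each other, the mean and skewness would be expected near zero and the standard deviation small; the kurtosis indicates how heavy the tails are relative to a Gaussian.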

2.1. Daily L4 SST fields monitored in L4-SQUAM

Currently, the following daily L4 SST fields are monitored in
L4-SQUAM: two NOAA OISST (AVHRR, AVHRR+AMSR-E) as
described in Reynolds et al. (2007), referred to herein as AVHRR_OI
and AVHRR_AMSR_OI, respectively; two OSTIA (operational and
retrospectively reanalyzed); two RTG (high and low resolution,
referred to herein as RTG_HR and RTG_LR, respectively); NAVO K10;
NESDIS Multi-SST analysis; JPL G1SST; CMC 0.2°; ODYSSEA; BoM
GAMSSA; and GMPE products. Also, JPL MUR and RSS MW are
being processed and work is underway to include the remaining
L4 products: RSS IR+MW, NRL NCODA, JMA MGDSST and DMI
analyses (see Table 1 for details). Many of the products included
in L4-SQUAM are also included in GMPE and described in Part 1
(Martin et al., 2012). However, there are some differences
between the GMPE and L4-SQUAM inputs. The products monitored
in L4-SQUAM are listed in Table 1.

The SST products listed in Table 1 comply with GHRSST
standards and specifications, except the RTG low resolution
product. As per the GHRSST specifications, SSTs are categorized
into one of the following types: interface, skin, sub-skin, depth
and foundation (Donlon et al., 2007). Each of the L4 SSTs listed in
Table 1 is designated with one of these types. [Note that the
Reynolds and RTG SSTs are adjusted to in situ SST and are often
referred to as "bulk" SSTs; however, this term is not recommended by
the GHRSST. Nevertheless, "bulk" is comparable to "depth" SSTs,
which according to GHRSST are defined as measurements beneath
the sub-skin, such as from drifting buoys and vertical profiling floats,
at depths ranging from $10^{-2}$ to $10^{3}$ m.] The OSTIA, CMC, GAMSSA,
G1SST, MUR, RSS, MGDSST, ODYSSEA and DMI products are
referred to as "foundation SSTs". These analysis schemes minimize
the use of retrievals affected by diurnal variability by
employing one or more of the following strategies: (a) using only
nighttime satellite data; (b) using additional daytime data with
wind speed above 6 m s$^{-1}$; and (c) excluding L2 SSTs flagged by
the producer as having high diurnal variability. The input data to
all L4 products are also listed in Table 1, along with information
about ice masks, which allow the user to interactively exclude
ice-covered grid cells from statistical analyses. Some products have
integrated ice information into their SST fields but do not provide
a separate mask to identify ice-covered grid cells (e.g., GMPE),
whereas other products have been produced without ice information
(e.g., NAVO K10). Some products with integrated ice information
did not provide a separate mask to extract ice-covered cells
in the beginning but added it at a later stage (e.g., CMC in
September 2011). Also, some products did not have an ice mask
included in the initial stage of production, but subsequently
added it (e.g., NESDIS Multi-SST analysis in May 2010). See
Table 1 for more information.

2.2. Merging procedure in L4-SQUAM for analyses of SST differences

To analyze SST differences, SSTs have to be matched up in
space to generate L4 pairs. This may be achieved by: (a) averaging
or interpolating all the L4 SSTs into a common grid (GMPE
approach), (b) interpolating the first term (L4$_1$ in $\Delta T_S$ = L4$_1$ - L4$_2$)
to the resolution of the second term (L4$_2$), using various linear or
cubic formulations or inverse distance-weighted methods, or
(c) selecting the nearest neighbor (NN). A detailed offline study
was performed for an extreme combination of ultra-high resolution
G1SST (0.01°) and low resolution RTG (0.5°), employing both
bilinear interpolation and the NN approach. Results are shown in
Fig. 1. They unambiguously suggest that the effect of the interpolation
scheme on the global comparison statistics is negligible.
(Note that this global result may not be valid when working
in highly dynamic regions.) The simpler NN approach was thus
adopted in L4-SQUAM.
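
The nearest-neighbor matching described above can be sketched as follows. This is only an illustration under stated assumptions, not the operational implementation: both fields are assumed to be on regular latitude-longitude grids stored as nested lists, missing or land cells are marked with `None`, longitude wrap-around is ignored, and the names `nearest_index` and `nn_differences` are hypothetical.

```python
def nearest_index(coords, target):
    """Index of the grid coordinate in `coords` closest to `target`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def nn_differences(first, first_lats, first_lons,
                   second, second_lats, second_lons):
    """Differences "first minus second", anchored to the grid of `second`.

    For every valid cell of `second`, the nearest cell of `first` is selected
    (no interpolation). Cells where either value is None are skipped.
    """
    # The grids are separable, so nearest rows and columns are found once.
    rows = [nearest_index(first_lats, lat) for lat in second_lats]
    cols = [nearest_index(first_lons, lon) for lon in second_lons]
    diffs = []
    for j, r in enumerate(rows):
        for i, c in enumerate(cols):
            a = first[r][c]
            b = second[j][i]
            if a is not None and b is not None:
                diffs.append(a - b)
    return diffs
```

Because the matching is anchored to the second term, the number of pairs returned is at most the number of valid cells in the second product, which is consistent with the remark made for the histograms in Section 3.1.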

In L4-SQUAM, analyses are performed in two ways. As an
example, for a combination of OSTIA and CMC, differences are
calculated both as "OSTIA - CMC" and "CMC - OSTIA". The second
term is the product to which the NN matching is done (i.e., CMC in
the first case and OSTIA in the second). As a result of differences in
the spatial interpolation, the comparison statistics may slightly
differ, but this difference is always small, as expected from Fig. 1.

3. Comparisons of global L4 SST fields in L4-SQUAM

This section describes the four types of diagnostics currently
implemented in L4-SQUAM. Note that statistics with respect to
any L4 are available on the L4-SQUAM webpage and the ones
used here are for illustration only. Also, not all graphs discussed in
this paper are reproduced here (e.g., comparison with ship and
buoy observations), but interested readers are invited to view
these graphs online.

3.1. Maps and histograms of $\Delta T_S$

Fig. 2A shows an example map of $\Delta T_S$ between two foundation
SSTs, GAMSSA and OSTIA.

Over most of the global ocean, $\Delta T_S$ is close to zero. However,
the differences are prominent in the southern oceans, where
GAMSSA is > 1 °C warmer with respect to OSTIA over some
regions, and in the Arctic, where the magnitude of differences
may exceed 2 °C. Differences of both signs are also observed in
many coastal locations. Also, different combinations of L4s show
different patterns and magnitudes of differences. For instance, for
13 July 2011, AVHRR_OI shows highly variable differences with
respect to OSTIA reaching more than ±1 °C (not shown) in many
areas of the global ocean, in particular where GAMSSA and OSTIA
appear to be consistent.

Fig. 1. Effect of interpolation on merging L4 SST fields (0.01° ultra-high resolution G1SST minus 0.5° lat-lon RTG). Statistical moments are annotated on the histograms
(see Section 3.1 for description). Left panels: nearest neighbor selection anchored to RTG; Right panels: bilinear interpolation of G1SST to RTG grid.
[Panel title: NASA JPL G1SST - RTG (low), 20100908; histogram axis: G1SST - RTG_LR (°C).]

Fig. 2B shows a histogram of the differences corresponding to
Fig. 2A. The $\Delta T_S$ statistics are annotated, including the number of
SST pairs, minimum, maximum, mean, standard deviation (Std
Dev), median, robust standard deviation (RSD), skewness and
kurtosis. [RSD here is defined as IQR/S, where IQR is the interquartile
range (75th percentile - 25th percentile, in an ordered dataset) and S
is a scaling factor which is 1.348 for an ideal normal distribution; cf.
Merchant and Harris, 1999.] The number of SST pairs approximately
represents the number of valid OSTIA SSTs because NN
matching is done to the OSTIA grid. A dotted gray line shows an ideal
Gaussian fit, $X \sim N(\mathrm{Median}, \mathrm{RSD})$. Additionally, numbers of SST
pairs beyond "Median ± 4 × RSD" are shown on the top right. Note
that time series of these outliers are plotted in L4-SQUAM but not
excluded from comparison statistics. Overall, the distribution of
$\Delta T_S$ is close to Gaussian, with mean and median close to zero, and
Std Dev ~0.69 °C and RSD ~0.36 °C.
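
The robust statistics defined above can be sketched with the Python standard library. This is a minimal illustration rather than the L4-SQUAM code: the function name `robust_stats` is hypothetical, and the percentile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the text does not specify how the 25th and 75th percentiles are interpolated.

```python
import statistics


def robust_stats(diffs, scale=1.348, k=4.0):
    """Median, RSD = IQR / scale, and counts of values beyond median +/- k * RSD.

    `scale` is the factor S quoted in the text for a normal distribution;
    `k` = 4 matches the "Median +/- 4 x RSD" outlier limits.
    """
    median = statistics.median(diffs)
    # Quartiles of the ordered sample; the middle cut point is not needed.
    q25, _, q75 = statistics.quantiles(diffs, n=4, method="inclusive")
    rsd = (q75 - q25) / scale
    low_limit = median - k * rsd
    high_limit = median + k * rsd
    n_low = sum(1 for d in diffs if d < low_limit)
    n_high = sum(1 for d in diffs if d > high_limit)
    return median, rsd, n_low, n_high
```

Following the text, the outliers are only counted here, not removed, so the conventional mean and standard deviation would still be computed from the full sample.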

The difference between the conventional and robust statistics
is noticeably high, indicating the large effect of outliers. A
significant negative skewness is consistent with a large fraction
of negative GAMSSA - OSTIA outliers found largely in the Arctic
(Fig. 2A), suggesting differences in treatment of ice in the two
products. Both L4 products contain ice masks which are derived
from different ice products. The bottom panels in Fig. 2 re-plot the
top panels, but with ice-covered grid cells excluded when ice is
reported in either mask or both. The statistics change significantly.
First, the number of SST pairs is reduced by ~20%, from
~16.8 million in the "all-grid" to ~13.4 million in the "ice-free"
ensemble. In the removed 3.4 million ice grid points, the temperature
was likely set to the default "melting ice" value (~ -2 °C) in at least one of
the products. There are grid points in which the ice cells have the
same values for both products, resulting in an artificial spike at
zero in Fig. 2B. On the other hand, there are also grid cells where
one product reports ice and the other does not, resulting in a cold
tail in the histogram and a somewhat distorted bell curve (an
artificial small mode). As a result, the mean ($\Delta T_S$) changes from
-0.07 °C in "all-grid" to +0.05 °C for the "ice-free" sample, and
the Std Dev is reduced from ~0.69 °C to ~0.59 °C. However, the
apparent worsening of skewness (compare Fig. 2B with Fig. 2D) is
related to its decrease in Fig. 2B, caused by the artificial small
mode in the -1.3 °C to -1.5 °C bins (Fig. 2B). Excluding icy pixels
can also increase the Std Dev for those combinations of L4s where
the assumed value of SST in partially ice-covered regions is the
same, e.g., for "AVHRR_OI minus AVHRR_AMSR_OI" (not shown),
due to excluding many grid points with zero $\Delta T_S$.
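
The "all-grid" and "ice-free" ensembles discussed above can be separated as in the following sketch. It assumes the two products have already been matched cell by cell and that each provides a boolean ice flag per cell; the function name `split_by_ice` and the flat-list data layout are illustrative assumptions, not details taken from L4-SQUAM.

```python
def split_by_ice(first, second, ice_first, ice_second):
    """Return (all_grid, ice_free) lists of differences "first minus second".

    A matched pair is dropped from the ice-free sample when ice is reported
    in either mask or in both, as described in the text.
    """
    all_grid = []
    ice_free = []
    for a, b, ice_a, ice_b in zip(first, second, ice_first, ice_second):
        diff = a - b
        all_grid.append(diff)
        if not (ice_a or ice_b):
            ice_free.append(diff)
    return all_grid, ice_free
```

Cells flagged as ice in both products with the same default temperature contribute zero differences to the all-grid sample only, which is the mechanism behind the artificial spike at zero noted above.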

The shape of the "ice-free" histogram is more regular and
symmetric, and shows improved consistency between the robust
and conventional statistics, indicating a reduced effect of outliers,
consistent with their reduced fraction. The "ice-free" analyses
emphasize product comparison in the physical SST domain,
whereas the "all-grids" analyses should assist L4 producers to
diagnose and reconcile different ice masks. Hence both analyses
are kept in L4-SQUAM and are available to its users by a click of a
button.

Fig. 2. In the left panels, spatial differences between GAMSSA and OSTIA are mapped. These are generally close to zero but prominent in some areas, e.g., roaring forties
and in many coastal locations. The Arctic ice areas also show significant differences. In the right panels, $\Delta T_S$ statistics are annotated on the left side of the histograms, a dotted
gray line shows an ideal Gaussian fit, and the numbers of L4 match-ups beyond "Median ± 4 × Robust Std Dev" are shown on the top right. Note that due to NN
interpolation, anchored to the second term (i.e., OSTIA), the number of SST pairs "N" is equal or close to the number of valid grid cells in OSTIA. Top panels: ice included in
the analyses; Bottom panels: ice excluded. (A) GAMSSA minus OSTIA, ice included. (B) Frequency distribution corresponding to Fig. 2A. (C) GAMSSA minus OSTIA, ice
excluded. (D) Frequency distribution corresponding to Fig. 2C.
[Panel titles (B) and (D): GAMSSA_28km - OSTIA, 20110713; histogram axis: GAMSSA - OSTIA (°C).]


3.2. Time series of "L4 minus L4" consistency and in situ validation

The statistical parameters annotated on the $\Delta T_S$ histograms are
plotted as a function of time for various combinations of L4s to
monitor products for relative stability and consistency.

Figs. 3A and B show examples of global "ice-free" mean
differences and standard deviations in L4 fields with respect to
AVHRR_OI. Figs. 3C and D show the same statistics with respect to
drifters and Figs. 3E and F show the same with respect to GMPE.

The time series in Fig. 3 are very busy due to a large number of
L4 products. However, users of the L4-SQUAM webpage can
perform interactive analyses by plotting and focusing on time
series for one or several products. It is also possible to interactively
apply a time filter to the statistics with the period of the
filter specified by the user.
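
The text does not say which kind of time filter the webpage applies. Purely as an illustration of smoothing a daily statistic with a user-chosen period, the sketch below uses a centered running mean; both the filter type and the name `running_mean` are assumptions.

```python
def running_mean(series, period):
    """Centered running mean of a daily statistic.

    `period` is the window length in days; an even value behaves like the
    next odd value. Near the ends the window shrinks to the available data.
    """
    if period < 1:
        raise ValueError("period must be a positive number of days")
    half = period // 2
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A longer period suppresses day-to-day spikes and makes slow drifts and seasonal cycles easier to see, at the cost of blurring sharp discontinuities such as changes in input data.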

The majority of the products are within ±0.15 °C of each
other. For example, the two daily NOAA OISST products are
largely consistent, with AVHRR_AMSR_OI being ~ +0.05 °C warmer
than AVHRR_OI. However, there are a few noticeable exceptions.
For instance, G1SST is generally colder (between +0.05 and
-0.2 °C) with respect to AVHRR_OI. Similarly, a cold bias relative
to AVHRR_OI is also seen in the NESDIS Multi-SST analysis and
RTG products since about the beginning of 2010, although to
varying magnitudes and with occasional spikes. Compared to
AVHRR_OI, RTG_LR was a little warmer until 6 January 2005, after
which time the two products became consistent until the end of
2007, and then RTG_LR became slightly colder than AVHRR_OI.
The CMC was from 0.0 to 0.2 °C warmer than AVHRR_OI until
about the end of 2004, after which time the two products have
shown a negligible mean bias. Also, a pre-2006 trend which
flattens out subsequently is observed. This coincides with the
change in input of NOAA OISST products from Pathfinder to Naval
Oceanographic Office (NAVOCEANO) SST on 1 January 2006
(Reynolds et al., 2007).

There appears to be clustering of some products into groups. For
example, the RTG_HR and NESDIS Multi-SST analysis products
closely follow each other. Note that the NESDIS Multi-SST analysis
uses a "thinned" RTG_HR for bias correction. Similar observations
are also seen for the foundation SSTs, with GAMSSA being sometimes
slightly warmer than the rest of the foundation family, e.g.,
from 13 April to 13 May 2010 (Fig. 3A). Shortly after its start in early
2006, OSTIA had a cold mean bias of ~ -0.2 °C with respect to
AVHRR_OI, which reduced to -0.1 °C later in 2006 but then briefly
spiked again in February 2007, May 2008 and May 2009. [OSTIA
reanalysis excluding ice has not been processed in L4-SQUAM yet and
consequently is not shown in Figs. 3A and B; work is underway to add it.]

The standard deviations with respect to AVHRR_OI show a
clear seasonal cycle for all L4 products, but with different
amplitudes. For instance, the two RTG, G1SST, and ODYSSEA
products show standard deviations between ~0.5 and 0.95 °C.
For OSTIA, K10 and GAMSSA, with respect to AVHRR_OI, the
standard deviations range between 0.45 and 0.65 °C, and the
NESDIS Multi-SST analysis shows slightly higher values. The two
NOAA OISST products are very consistent. A clear discontinuity in
the Std Dev is also observed for "CMC minus AVHRR_OI" and
"RTG_LR minus AVHRR_OI" on January 1, 2007. On that day, the
volume of satellite data used by AVHRR_OI effectively doubled
when retrievals from NOAA-18 were added to the retrievals from
NOAA-17 that had been used previously.

Fig. 3. Mean and standard deviation of $\Delta T_S$. Left panels: mean; Right panels: standard deviation. Top panels: statistics with respect to Reynolds (AVHRR) excluding ice grids;
Middle panels: same as top panels but with respect to drifters; Bottom panels: same as top panels but with respect to GMPE. (A) Mean, "L4 - Reynolds (AVHRR)", ice excluded. (B)
Std Dev, "L4 - Reynolds (AVHRR)", ice excluded. (C) Mean, "L4 - Drifters". (D) Std Dev, "L4 - Drifters". (E) Mean, "L4 - GMPE", ice excluded. (F) Std Dev, "L4 - GMPE", ice excluded.

L4-SQUAM in situ validation is stratified into drifters, ships,
and tropical and coastal moorings, following the four major in situ
data types available in iQUAM.

Figs. 3C and D show global mean bias and standard deviation
in L4 products with respect to drifters. Many of the observations
in Figs. 3A and B are also reproduced in Figs. 3C and D, but with a
reduced magnitude. For example, "RTG_HR minus AVHRR_OI" Std
Dev ranges between 0.5 and 0.95 °C with strong seasonality,
whereas for "RTG_HR minus Drifters" it ranges between 0.35
and 0.55 °C. It is also observed that "L4 minus GMPE" and "L4
minus Drifters" show remarkable consistency, although of slightly
different magnitudes. For example, Std Dev of "RTG_HR minus
GMPE" ranges between 0.35 and 0.5 °C and shows patterns
similar to "RTG_HR minus Drifters" ("RTG_HR minus GMPE" is
available only for all-grids as neither L4 provides an ice mask).
These results highlight the utility of GMPE as a reference field.
(It should be noted that drifter SSTs are input into most of the L4
analyses in this study; see Table 1.) This result is consistent with
Part 1 (Martin et al., 2012), which has shown that GMPE has lower
errors than other SST analyses when compared with Argo floats.
However, reprocessing GMPE back in time is needed to extend
the time coverage.

Comparisons against ship data and moorings also show some
noteworthy characteristics, not shown here in the interest of space.
All the L4 SSTs show negative differences when compared to ship
data, i.e., ship records are warmer due to engine intake, and also
show much stronger seasonality (cf. Xu and Ignatov, 2011).
The standard deviations with respect to ship data are also much
higher, ranging from 0.75 to 1.3 °C. One interesting observation in
the "L4 minus Ships" mean differences is seasonal (sinusoidal)
patterns of comparable amplitudes but different signs. For example,
the plots for CMC and RTG seem to be anti-correlated to the plots
shown by AVHRR_OI and AVHRR_AMSR_OI.

Validation statistics against coastal moorings also vary significantly
between different products. For example, the Std Dev
approximately ranges from 0.35 to 0.8 °C for AVHRR_OI and CMC,
0.4 to 1.0 °C for the NESDIS Multi-SST, 0.38-1.5 °C for G1SST, and
0.6-1.4 °C for RTG, K10, GAMSSA, ODYSSEA and GMPE. There was
an increase in the Std Dev for OSTIA on 28 July 2009. Prior to this
date, the Std Dev ranged between 0.2 and 0.6 °C, but after this
date, the Std Dev ranges from 0.4 to 1.6 °C. The reasons for these
differences are not fully apparent at this stage and should be
further investigated.

Fig. 4. Hovmöller diagrams of average zonal differences. First column: RTG (high) - Reynolds, ice excluded; Second column: RTG (high) - Drifters; Third column: Reynolds
(AVHRR) - Drifters; Top panels: mean differences; Bottom panels: standard deviations.

Although  some  pairs  of  products  show  a  close  to  zero  global 
mean  difference,  the  large  standard  deviation  suggests  significant 
regional differences. These are further analyzed in L4-SQUAM
using Hovmöller diagrams.

3.3. Hovmöller diagrams

Hovmöller diagrams provide a way to visualize and understand
the zonal time series evolution of ΔTs and to detect seasonal
cycles and climatic trends. Fig. 4 (two left panels) shows example
Hovmöller diagrams of ice-free mean biases and standard
deviations for "RTG_HR minus AVHRR_OI".
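
As an illustration of how such a diagram can be assembled, the following minimal Python sketch reduces one day's gridded ΔT field to zonal means and standard deviations, one pair per latitude row. Stacking these columns for successive days gives the time-latitude array that is plotted. The function names `zonal_stats` and `hovmoller`, the list-of-rows grid layout and the use of `None` for masked cells are assumptions made for this example; they do not describe the actual L4-SQUAM code.

```python
import math

def zonal_stats(delta_t_grid):
    """Return a (mean, std) pair for each latitude row of a 2-D delta-T grid.

    delta_t_grid: list of rows (one per latitude), each a list of delta-T
    values in deg C. None marks a masked cell (land, ice or missing data);
    this masking convention is an assumption of the sketch.
    Rows without any valid cell yield (None, None).
    """
    stats = []
    for row in delta_t_grid:
        valid = [v for v in row if v is not None]
        if not valid:
            stats.append((None, None))
            continue
        mean = sum(valid) / len(valid)
        var = sum((v - mean) ** 2 for v in valid) / len(valid)
        stats.append((mean, math.sqrt(var)))
    return stats

def hovmoller(daily_grids):
    """Stack per-day zonal statistics: result[day][lat_row] = (mean, std)."""
    return [zonal_stats(grid) for grid in daily_grids]

# Synthetic example: two days, three latitude rows, four longitudes.
day1 = [[0.1, 0.3, None, 0.2],
        [None, None, None, None],      # fully masked row (e.g. ice)
        [-0.5, -0.7, -0.6, -0.6]]
day2 = [[0.0, 0.2, 0.1, 0.1],
        [0.4, None, 0.6, None],
        [-0.4, -0.4, -0.4, -0.4]]
diagram = hovmoller([day1, day2])
```

Cells within one latitude row have equal area on a regular latitude-longitude grid, so no area weighting is applied within a row in this sketch.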

On average, the two SSTs agree well everywhere except in the
high latitudes around 60°S and 70°N, where large persistent
biases and seasonal cycles are observed. The standard deviations
are small in the sub-tropical oceans, increasing in the
Inter-Tropical Convergence Zone (ITCZ) and the mid-latitudes, and
reaching 0.75-1 °C from 40°N to 75°N. The cause of these
differences is not fully clear. Recall that AVHRR_OI uses the
NAVOCEANO L2 SST as input, whereas RTG_HR employs a unique
physical SST retrieval as a part of its L4 production. Similar
patterns are observed in RTG_HR and NESDIS Multi-SST analysis
products, compared to any other L4 products.

To better understand the causes of these differences, mean
biases and standard deviations of "RTG_HR minus Drifters" and
"AVHRR_OI minus Drifters" are also plotted in Fig. 4. Both L4
products show a near zero mean bias in the full domain covered
by drifters. The large "RTG_HR minus AVHRR_OI" mean biases and
standard deviations are not captured in the in situ validation,
suggesting that they exist in the areas not covered by in situ data.
On the other hand, in areas where in situ data are present, both
L4s agree well with them. This suggests that drifters are
assimilated in both L4s with a relatively large weight and illustrates that
using the same in situ data to validate an L4 as were ingested by
the L4 will yield an unreliable assessment of the true global
performance of the product.

Another interesting observation is warmer biases in
GAMSSA over the Southern Ocean and colder biases over the Arctic
Ocean compared to most other products (not shown). Over the
Arctic Ocean, in fact, most of the products show distinctive seasonal
biases with respect to each other (not only GAMSSA). Besides the
differences in sea-ice treatment discussed in Section 3.1, these
differences may also be attributed to different bias correction
schemes and zonal inconsistencies between input L2 pre-processed
(L2P) products. For example, the GAMSSA system removes biases in
the input L2P SSTs on a global basis by applying the bias corrections
provided by the L2P producers (Cayula et al., 2004). In contrast, the
Met Office uses regional AATSR and buoy SSTs to estimate and
remove the biases of the L2P inputs (Stark et al., 2007). The Reynolds
OISST and CMC systems adjust all satellite inputs for bias on a
regional basis using both buoy and ship SSTs (Reynolds et al., 2007;
Brasnett, 2008). Besides differences in bias corrections, the L2P
inputs also show significant mutual zonal differences. For example,
Reynolds et al. (2010; see Fig. 5 therein) showed that AVHRR, AATSR
and AMSR-E SSTs diverge at high latitudes as well as over the
equator when referenced to AVHRR_AMSR_OI SST. Notably,
NAVOCEANO NOAA-17 and -18 AVHRR SSTs are warmer over the
Southern Ocean. Similar patterns are also seen from comparisons
between NAVOCEANO AVHRR GAC and AVHRR_OI SSTs (cf.
Hovmöller diagrams at http://www.star.nesdis.noaa.gov/sod/sst/squam/NAVO/).
It may therefore be inferred that much of the warm
bias between GAMSSA and other L4 products over the Southern
Ocean and mutual inconsistencies between most L4 SSTs over the
Arctic Ocean can be mitigated by using L2 SSTs which use regional
(zonal) calibrations and provide per-pixel bias estimates based on
regional in situ observations (rather than global). Ideally, this should
form a goal for the GHRSST L2P data providers. This would also
reduce the need for analysis systems to perform their own
bias-correction of satellite data and allow for greater consistency between
L4 products and towards the true values.

Fig. 5. Average "Day minus Night" SST differences estimated employing the double
differencing (DD) technique, with daily Reynolds OISST as the transfer standard.


4.  Possible  extension  of  L4-SQUAM  analyses 

This  section  explores  potential  extensions  to  the  L4-SQUAM 
functionalities. 

4.1. Diurnal-cycle resolved L4 products

All L4 products currently monitored in SQUAM are updated daily
and do not resolve the diurnal cycle. Some L4 developers have
started generating diurnal cycle resolved L4 products (e.g., BoM and
NCODA produce 3-hourly experimental and 6-hourly operational
products, respectively). Modeling of diurnal variation (DV) may have
various degrees of complexity and accuracy, depending on methods
of accounting for solar insolation and its propagation in the top few
meters of the ocean water (cf. Stuart-Menteth et al., 2005;
Gentemann et al., 2007; Donlon et al., 2007; Kennedy et al., 2007).
Note  that  the  original  Global  Ocean  Data  Assimilation  Experiment 
(GODAE) requirements for GHRSST L4 products are 10 km resolution
and 0J> °C accuracy, available every 6 h (cf. Donlon et al., 2009).
Therefore, one could expect that the recent trend towards finer time
resolution L4 products will continue, and L4-SQUAM will need to be
adjusted accordingly to report and monitor such L4 products. For
example, to compare L4 products with different update cycles, such as
24-hourly products against 6-hourly products, one option is to
calculate a daily average of the product with the shorter update cycle.
Comparison of products with similar update cycles is achievable
with the current L4-SQUAM framework.
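
The daily-averaging option can be sketched as follows. This is a minimal Python illustration, assuming four 6-hourly fields and one daily field on the same grid, flattened to lists. The helper names `daily_mean` and `difference`, the `None` convention for missing cells and the synthetic values are all hypothetical and are not part of L4-SQUAM.

```python
def daily_mean(sub_daily_fields):
    """Average several same-day fields cell by cell.

    sub_daily_fields: list of flat lists, one per analysis time (e.g. four
    6-hourly fields), all on the same grid. None marks a missing cell
    (an assumption of this sketch). A cell is averaged over the analysis
    times at which it is valid; it stays None if it is never valid.
    """
    n_cells = len(sub_daily_fields[0])
    out = []
    for i in range(n_cells):
        vals = [f[i] for f in sub_daily_fields if f[i] is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out

def difference(field_a, field_b):
    """Cell-by-cell A minus B; None where either field is missing."""
    return [a - b if a is not None and b is not None else None
            for a, b in zip(field_a, field_b)]

# Synthetic SSTs (deg C) on a three-cell grid.
six_hourly = [
    [20.0, 15.0, None],   # 00 UTC
    [20.2, 15.1, None],   # 06 UTC
    [20.8, 15.5, 10.0],   # 12 UTC
    [20.6, 15.4, None],   # 18 UTC
]
daily_product = [20.3, 15.2, 10.1]
delta_t = difference(daily_mean(six_hourly), daily_product)
```

Note that a cell valid at only one analysis time (the third cell here) takes that single value as its "daily mean", which can alias the diurnal cycle into the difference; a stricter variant could require all analysis times to be present.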

Analyses by Dash et al. (2010) suggest that one could validate the
DV models by combining satellite L2 products with L4 SSTs. Toward
that objective, a double-differencing (DD) technique was implemented
in L2-SQUAM. In particular, Day - Night (DN) DDs are calculated
as follows: $DN = (T_{SD} - T_R) - (T_{SN} - T_R) \approx T_{SD} - T_{SN}$, where $T_{SD}$ and $T_{SN}$ are
daytime and nighttime satellite L2 SSTs, and $T_R$ is the L4 "reference"
SST which is used here as a "transfer standard". Note that DN
differences can also be calculated by direct differencing of the
respective L2 products, but this can only be done in a sub-sample of
the global data domain, where both day and night retrievals are
available at the same location. However, the DD technique allows
substantial extension of the comparison domain (by including even
those points where either day or night L2 SST is unavailable) and
makes the comparison more stable and consistent from day to day.
More discussion is found in Dash et al. (2010).
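
To make the domain argument concrete, the following minimal Python sketch computes a DN double difference from synthetic values. It assumes that each of the two single differences is averaged over its own set of valid cells before subtraction. The function names, the `None` convention for missing retrievals and the numbers are hypothetical and do not reproduce the L2-SQUAM implementation.

```python
def mean_diff(sat, ref):
    """Mean of (satellite - reference) over cells where the satellite SST exists.

    None marks a cell with no L2 retrieval (assumption of this sketch);
    the reference L4 field is gap-free.
    """
    diffs = [s - r for s, r in zip(sat, ref) if s is not None]
    return sum(diffs) / len(diffs)

def day_night_dd(t_sd, t_sn, t_r):
    """DN = mean(T_SD - T_R) - mean(T_SN - T_R), each mean over its own domain."""
    return mean_diff(t_sd, t_r) - mean_diff(t_sn, t_r)

# Synthetic SSTs (deg C) on a four-cell grid.
t_r  = [20.0, 18.0, 16.0, 14.0]   # daily L4 reference (transfer standard)
t_sd = [20.5, None, 16.4, 14.3]   # daytime L2: three valid cells
t_sn = [20.1, 18.1, None, None]   # nighttime L2: two valid cells
dn = day_night_dd(t_sd, t_sn, t_r)
```

Here direct differencing would be possible only in the first cell, the single location with both a day and a night retrieval, whereas the double difference uses all five available retrievals and gives a DN of about 0.3 °C.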

Fig.  5  shows  an  example  DN  time  series  for  four  AVHRR 
sensors,  generated  by  the  NESDIS  heritage  SST  system,  using 
AVHRR_OI  as  the  transfer  standard. 

The DN values are mostly positive, because day SST is warmer
than night SST. In the calculation of DDs, the AVHRR L2 SST product is
subject to diurnal changes, but the AVHRR_OI SST field has one daily
value per grid cell, which cancels out. As expected, the afternoon
platforms, NOAA-18 and -19, which pass at ~1:30 am/pm, show higher
DN values than the morning platforms, NOAA-17 and Metop-A,
which overpass at ~10 am/pm. (A systematic residual offset
between NOAA-18 and -19 of ~0.10 °C is likely due to the empirical
setting of regression coefficients in NESDIS L2 production and not
from the DV physics. Work is underway to understand and remove
this bias.) Using a diurnally resolved L4 as a transfer standard in the
DD technique should compensate for the diurnal differences
observed in the L2 product, and make the DN time series flat and
close to zero. Thus calculation of DN differences using the DD technique,
with various diurnally resolved L4 products employing different DV
models, provides an assessment of global performances of the
diurnally resolved L4 products.

Likewise,  any  external  DV  model  can  also  be  validated  using 
this  technique  by  applying  the  model  for  removing  the  diurnal 
variations  from  L2  SSTs,  or  by  adding  DV  amounts  on  the  top  of 
the  "daily"  L4  field  and  then  recalculating  the  DN  DDs.  These 
analyses  are  the  subject  of  future  work  and  will  contribute  to  the 
GHRSST DV working group activities
(https://www.ghrsst.org/ghrsst-science/science-team-groups/dv-wg/).

4.2. Dependencies

The SST differences may also be plotted as a function of
geophysical conditions, e.g., latitude, proximity to the coast and
bathymetry. Such "dependencies" plots are helpful to stratify the
differences and focus on domains with the largest differences.
Examples of wind speed dependencies are shown in Fig. 6 for
"MUR minus GMPE" and "CMC minus GMPE".

Both MUR and CMC are foundation SST products. Comparisons
within L4-SQUAM indicate that the GMPE provides a good average
representation of the foundation family. It is thus expected that
these products should be consistent over the full range of wind speeds.
Indeed, there is a high degree of consistency between MUR, CMC
and GMPE. However, the corresponding ΔTs vary across the wind
speed range, with product-specific amplitudes. For example, at low
winds MUR is colder than GMPE by 0.1 °C whereas at high winds it
is warmer by 0.1 °C. Under low wind conditions, this may be
attributed to a cool-skin effect, which reduces with increasing wind
speed, MUR being a satellite-only product (no in situ; see Table 1;
later versions of MUR will use in situ data). The corresponding
standard deviations are largest at low winds (~0.5 °C) and decrease
towards higher winds, reaching ~0.35-0.40 °C. The CMC product
shows similar trends but with smaller magnitudes. Including such
dependencies in SQUAM and verifying over a longer time series will
help to better understand the cause of these residual biases.
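
A dependency plot of this kind amounts to binning ΔT by the geophysical variable and computing statistics per bin. The minimal Python sketch below does this for wind speed, returning the sample count, mean and standard deviation for each bin. The count is what indicates where the statistics are populated well enough to be trusted. The function name `bin_by_wind`, the 2 m/s bin width and the synthetic values are assumptions for illustration only.

```python
import math
from collections import defaultdict

def bin_by_wind(delta_t, wind_speed, bin_width=2.0):
    """Group delta-T values into wind-speed bins.

    delta_t, wind_speed: equal-length lists of collocated values
    (deg C and m/s); None marks a missing value (assumption of this sketch).
    Returns {bin_lower_edge: (count, mean, std)}, ordered by bin edge.
    """
    bins = defaultdict(list)
    for dt, w in zip(delta_t, wind_speed):
        if dt is None or w is None:
            continue
        edge = math.floor(w / bin_width) * bin_width
        bins[edge].append(dt)
    stats = {}
    for edge in sorted(bins):
        vals = bins[edge]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[edge] = (len(vals), mean, std)
    return stats

# Synthetic collocated samples.
delta_t    = [-0.2, 0.0, -0.1, 0.05, 0.15, 0.1]
wind_speed = [ 1.0, 1.5,  0.5, 7.0, 12.5, 13.0]
result = bin_by_wind(delta_t, wind_speed)
```

The same function could be applied to other variables named in the text (latitude, distance to coast, bathymetry) by passing them in place of wind speed with a suitable bin width.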

4.3.  Correlograms 

Another potential extension of L4-SQUAM is adding
autocorrelation analyses (cf. Box and Jenkins, 1976). The autocorrelation
of the time series is defined as a lagged correlation between the
same variable measured at two different times (days), $x_t$ and
$x_{t+k}$, and is used to detect non-randomness in the time series.
The autocorrelation coefficient "r" for lag "k" is calculated as
$r_k = \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_{t=1}^{N} (x_t - \bar{x})^2$,
where $\bar{x}$ is the mean of the $N$-day time series. The "r" vs. "k" for time series
biases and standard deviations in "L4 minus drifters" are shown in
Fig. 7 (upper and middle panels), respectively.

Fig. 6. Dependence of "JPL MUR - GMPE" and "CMC 0.2 - GMPE" ΔTs on wind
speed. Top panel: dependence of mean ΔTs; Middle panel: dependence of ΔT
standard deviations; Bottom panel: distribution of wind speed, to check where
distributions are statistically relevant.
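
A correlogram of this kind can be computed as in the minimal Python sketch below. It assumes the standard sample estimator, in which lagged products of deviations from the series mean are normalized by the total sum of squared deviations. Whether L4-SQUAM uses exactly this normalization is an assumption, and the function names and the synthetic bias series are hypothetical.

```python
def autocorrelation(x, k):
    """Sample autocorrelation coefficient of the series x at lag k (in steps).

    Uses the common estimator with the full-series mean and the full-series
    sum of squares in the denominator (assumed here, not confirmed for
    L4-SQUAM).
    """
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom

def correlogram(x, max_lag):
    """Autocorrelation coefficients for lags 0..max_lag."""
    return [autocorrelation(x, k) for k in range(max_lag + 1)]

# Synthetic daily "L4 minus drifters" mean biases (deg C).
bias = [0.10, 0.12, 0.11, 0.09, 0.05, 0.04, 0.06, 0.08, 0.10, 0.11]
r = correlogram(bias, 3)
```

By construction the coefficient at lag 0 equals 1. A slowly varying series such as the one above gives a clearly positive lag-1 coefficient, while a series that flips sign every day gives a negative one.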

In general, if day-to-day variations in "L4 minus drifters" mean
biases and standard deviations are random, then the error in the L4
field has no "memory" and "r" would be close to zero. Deviation of
"r" from zero can be used as a measure of this memory. Both the
upper and middle panels of Fig. 7 show that autocorrelations are positive
and very strong for the first several days and then decay
exponentially. However, the magnitudes of "r" can be significantly different
for different L4 SSTs, and also between mean bias and Std Dev for a
given product. For example, in Fig. 7 upper panel, OSTIA shows the
lowest and RTG_HR shows the highest "randomness", whereas in
Fig. 7 middle panel, AVHRR_OI and OSTIA_RAN show lowest and
GAMSSA and ODYSSEA show highest "randomness". Fig. 7 upper
panel suggests that the biases in some fields, e.g., OSTIA, are rather
smooth and consistent with respect to drifters whereas for some
fields (e.g., RTG_HR) they are noisier. When interpreted
appropriately, the shape of such a correlogram can be used as an
indicator of the closeness of the L4 SST to drifters for each product,
and also of the level of persistence of the background field,
globally. It should be noted that interpretation of such
"preliminary" conceptual plots is best performed in conjunction with
validation time series and spatial autocorrelation maps. For
example, an L4 with high "r" but consistent low bias and Std Dev might
indicate a higher quality as opposed to an L4 with low "r" but
consistent high bias and Std Dev.

Fig. 7. Correlograms for daily time series data of "L4 minus Drifters". Top panel:
autocorrelation coefficients of mean biases; Middle panel: same as in top panel
but for standard deviation; Bottom panel: number of match-ups. For top and
middle panels, X-axis shows time lag in days (k=0, 1, 2, ..., 30).


5.  Summary  and  future  work 

The web-based L4 SST quality monitor (L4-SQUAM) has been
developed at NOAA NESDIS to monitor global L4 SST fields, in near
real-time and retrospective modes. The L4-SQUAM is
complementary to the two other existing systems of the IC-TAG: the
GHRSST Multi Product Ensemble (GMPE; Martin et al., 2012) and
the High Resolution Diagnostic Data Set (HR-DDS; Donlon et al.,
2009). To date, thirteen daily L4 SSTs are monitored in L4-SQUAM,
with two additions underway and four others planned.

L4-SQUAM metrics are based on analyses of "L4 minus L4" and
"L4 minus in situ" ΔTs. Maps and Hovmöller plots provide synoptic
snapshots of similarities and differences between various
products, histograms check for their proximity to a Gaussian shape,
and time series assess relative stability of consistency statistics.
To better interpret the effect of ice masks on these L4 products,
analyses are performed in two ways: (a) including ice and
(b) excluding ice, when the corresponding information to extract
an ice mask is available within the product. All processing is done
automatically within 24 h of data availability to the system.
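
One simple way to quantify the proximity of a ΔT histogram to a Gaussian shape is through its higher moments, since a Gaussian has zero skewness and zero excess kurtosis. The minimal Python sketch below computes these from a list of ΔT values. The choice of these particular diagnostics, the function name `moments` and the sample values are assumptions for illustration and are not taken from the L4-SQUAM processing.

```python
import math

def moments(delta_t):
    """Mean, standard deviation, skewness and excess kurtosis of delta-T values.

    Population (1/n) moments are used. For a Gaussian distribution both the
    skewness and the excess kurtosis are zero.
    """
    n = len(delta_t)
    mean = sum(delta_t) / n
    m2 = sum((v - mean) ** 2 for v in delta_t) / n
    m3 = sum((v - mean) ** 3 for v in delta_t) / n
    m4 = sum((v - mean) ** 4 for v in delta_t) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, std, skew, excess_kurtosis

# Synthetic, symmetric sample of delta-T values (deg C).
sample = [-0.6, -0.3, -0.1, 0.0, 0.0, 0.1, 0.3, 0.6]
mean, std, skew, kurt = moments(sample)
```

For the symmetric sample above the mean and skewness are zero to within rounding; a long warm or cold tail in a real ΔT histogram would show up as a non-zero skewness.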

Our results show that foundation SSTs are more consistent
with each other whereas some depth-SSTs show persistent zonal
differences. The differences are often more pronounced in high
latitudes, associated with ice and sparse data coverage in both
satellite and in situ data, and in coastal areas. However, large
differences also exist in the open oceans. Our analyses also
emphasize the need for diurnally resolved L4 SSTs, and their
global validation.

We note that an SST analysis is designed to produce the best
estimate of SST for a given time and location, over a regular grid
based on irregularly spaced sparse measurement datasets (cf.
Donlon et al., 2011). The specified grid resolution of a product
defines the smallest possible SST features that could potentially be
resolved by the analysis, but grid resolution is often not the same as
the end-to-end analysis-system resolution. The length of analysis
time window, during which the input data sets are considered
"coincident", also varies among the L4 products. The design of
analysis parameters and the variable characteristics of the inputs
to the analysis thus determine the end-to-end resolution. An
analysis may be smoother than the output grid resolution
depending on the choice of background error covariance and correlation
length scale. Such a product may be perfectly adequate for its
intended application (e.g., for numerical weather prediction systems
where "noisy" features cause instabilities in the model) but
inadequate for another application (e.g., monitoring of SST frontal
dynamics). Coarse resolution input data (e.g., passive microwave
SST at ~50 km resolution) cannot resolve fine mesoscale features
unlike the infrared satellite sensors, and an analysis dominated by
the former cannot be expected to deliver high resolution output.
Thus high-resolution SST features within each L4 product are
dynamic and are visible only when sufficient SST data are available
for input: persistent lack of data results in the analysis system
reverting to a smooth background climatological value, until new
data are again ingested. Users of SST analysis products must be
aware that the representation of reality of every analysis system on
a given day is extremely dynamic (Donlon et al., 2011). The
challenge for all analysis systems is to maximize the signal to noise
ratio for a given output grid resolution while maintaining the
highest feature resolution possible through the careful choice of
analysis design for a given application.

The SQUAM system provides a tool that reveals differences
between many operational analyses and is a significant step towards
understanding such differences. Maps, histograms and time series
plots for all combinations of "L4 minus L4" for all available dates are
made available for users in an easy to use web-based interface at
http://www.star.nesdis.noaa.gov/sod/sst/squam/L4/. Having L4 SSTs
uniformly analyzed and compared to the same in situ data using a
single interface allows L4-SQUAM to provide users and producers of
L4 products with valuable information on availability, relative merit
for particular applications and potential areas of improvement of
these products. We emphasize that it is not the purpose of this
paper to determine which data set is the "best" or select "one"
product suitable for all applications: it is a user choice to decide
which product is better suited to their applications based on
diagnostics from GMPE, L4-SQUAM and HR-DDS. Rather, we view
the L4-SQUAM as a "best practice" for intercomparison, which does
provide diagnostics to identify potential issues in any given product.
For example, if one product deviates from the majority of the
products for any given region, it is more likely that the problem is
in the deviant product.

For all the L4 SST products, we make the following
recommendations: (a) sea-ice information and corresponding masks
that separate ice-covered grid cells from open water should be
provided (ideally as sea ice concentration and sea ice edge);
(b) independent reference datasets (i.e., data not assimilated into
L4 systems), such as the surface SST values derived from Argo
floats and un-ingested in situ data, should be maintained for
consistent validation of products; and (c) operational L4 SST data
products should be reprocessed to provide consistent outputs
for diagnostic and other scientific applications.

Future work to improve the L4-SQUAM system includes tools to
estimate the individual contribution of a given product to the
observed differences. This may be achieved by employing a
three-way error analysis, recently applied by O'Carroll et al. (2008),
allowing individual errors for a given combination of three datasets
to be derived within SQUAM (assuming products have the mutually
independent errors required for a three-way error analysis).
Product-specific spatial error fields may be derived, rather than a single
global mean (cf. Xu and Ignatov (2011), who explored derivation of
error fields using Pathfinder SST, AVHRR_OI and in situ data).
Time-averaged L4 SST differences, e.g., monthly mean difference maps,
may also be useful for identifying persistent and seasonal features,
as has been suggested by some L4 producers. Potential use of
dependence plots was also discussed in this study. As the SQUAM
system matures, these features will be considered as part of the
long-term effort to provide users and producers of L4 SST products
with effective and useful diagnostic tools that facilitate
improvement of these products and their applications.
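
The principle of a three-way error analysis can be sketched as follows. If three collocated datasets have mutually uncorrelated errors, the variance of each pairwise difference is the sum of the two error variances, and the three pairwise variances can be solved for the three individual error variances. The Python below is a minimal illustration under that assumption. The function names are hypothetical, and the synthetic errors are constructed to be exactly uncorrelated so that the known values are recovered.

```python
import math

def variance_of_difference(a, b):
    """Population variance of the pointwise difference a - b."""
    d = [x - y for x, y in zip(a, b)]
    m = sum(d) / len(d)
    return sum((v - m) ** 2 for v in d) / len(d)

def three_way_errors(a, b, c):
    """Error standard deviation of each of three collocated datasets.

    Assumes mutually uncorrelated errors, so that
    Var(a - b) = s_a^2 + s_b^2, and likewise for the other two pairs.
    """
    v_ab = variance_of_difference(a, b)
    v_bc = variance_of_difference(b, c)
    v_ca = variance_of_difference(c, a)
    var_a = 0.5 * (v_ab + v_ca - v_bc)
    var_b = 0.5 * (v_ab + v_bc - v_ca)
    var_c = 0.5 * (v_bc + v_ca - v_ab)
    # Sampling noise can make an estimate slightly negative; clip at zero.
    return tuple(math.sqrt(max(v, 0.0)) for v in (var_a, var_b, var_c))

# Synthetic "truth" plus zero-mean, mutually orthogonal error patterns
# with standard deviations 0.2, 0.3 and 0.5 deg C.
truth = [15.0, 18.0, 21.0, 24.0]
a = [t + 0.2 * e for t, e in zip(truth, [1, -1, 1, -1])]
b = [t + 0.3 * e for t, e in zip(truth, [1, 1, -1, -1])]
c = [t + 0.5 * e for t, e in zip(truth, [1, -1, -1, 1])]
s_a, s_b, s_c = three_way_errors(a, b, c)
```

With real products the independence assumption is the critical point: two L4 analyses that ingest the same satellite or in situ inputs will have correlated errors, and the estimates for both will then be biased low.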